Abstract. The ordinal approach to evaluating time series, which goes back to the innovative work
of Bandt and Pompe, has increasingly established itself among the techniques of
nonlinear time series analysis. In this paper, we summarize and generalize the theory of
determining the Kolmogorov-Sinai entropy of a measure-preserving dynamical system
via increasing sequences of order-generated partitions of the state space. Our main
focus is on measuring processes without information loss. In particular, we consider the
question of the minimal number of measurements necessary, in relation to the properties
of a given dynamical system.


1. Introduction 

Since the invention of permutation entropy by Bandt and Pompe [8] and the proof
of its coincidence with the Kolmogorov-Sinai entropy for piecewise monotone interval maps
by Bandt et al. in [7], there has been increasing interest in considering time series and
dynamical systems from a purely ordinal point of view (see Amigo [4]). The idea
behind this viewpoint is that much of the information about a system is already contained in
the ordinal patterns describing the ups and downs of its orbits. This ordinal view can be
particularly useful for physical quantities for which the statement that one
measured value is larger than another is well interpretable, but the concrete
differences between measured values are not. A prominent example is the (indirect)
measurement of temperature as the mean kinetic energy of the particles of a system
by a thermometer. One can make statements about what is warmer or colder, but, for
example, an increase by 1°C is hard to interpret without knowing the baseline value.

This paper discusses the Kolmogorov-Sinai entropy from the ordinal
viewpoint. It reviews, extends and generalizes former results given by
Antoniouk et al. [6], Amigo [3], Keller [14], Keller and Sinn [15, 16] and Amigo et al. [5].
Aspects of entropy estimation are also touched upon.

The framework. The basic model of our discussion is a measure-preserving dynamical
system $(\Omega, \mathcal{A}, \mu, T)$, i.e. $\Omega$ is a non-empty set whose elements are interpreted as the
states of a system, $\mathcal{A}$ is a sigma-algebra on $\Omega$, $\mu : \mathcal{A} \to [0,1]$ is a probability measure,
and $T : \Omega \to \Omega$ is an $\mathcal{A}$-$\mathcal{A}$-measurable $\mu$-preserving map describing the dynamics of the
system. Here $\mu$-preserving means that $\mu(T^{-1}(A)) = \mu(A)$ for all $A \in \mathcal{A}$; the measure $\mu$ is
then called $T$-invariant.

We want to have some kind of regularity of $T$ by assuming at least one of the following
conditions:

(1) $T$ is ergodic with respect to $\mu$, i.e. $\mu(A) \in \{0, 1\}$ for all $A \in \mathcal{A}$ with $T^{-1}(A) = A$,

(2) $\Omega$ can be embedded into some compact metrizable space so that $\mathcal{A} = \mathbb{B}(\Omega)$.

Here and in the whole paper, $\mathbb{B}(\Omega)$ denotes the Borel $\sigma$-algebra in the case that $\Omega$ is a
topological space. As usual, instead of saying that $T$ is ergodic with respect to $\mu$, we also say that $\mu$
is ergodic for $T$.

Often the states of a system, whatever they are, cannot be accessed directly, but
information on them can be obtained by measurements. In this paper such measurements
are assumed to be given via observables $X_1, X_2, X_3, \ldots$ defined as $\mathbb{R}$-valued random
variables on the probability space $(\Omega, \mathcal{A}, \mu)$. So the measurements are provided by a
stochastic process (we say sequence of observables) $\mathbf{X} = (X_i)_{i \in \mathbb{N}}$ whose realization
has components $(X_i(T^{\circ t}(\omega)))_{t \in \mathbb{N}_0}$. Here $X_i(T^{\circ t}(\omega))$ is interpreted as the $i$-th measured
value from the system at time $t$ when starting in state $\omega \in \Omega$.

A priori we have infinitely many observables providing more and more information.
The finite case, however, is included by assuming that all $X_i$ with $i \geq n$ are equal for some $n \in \mathbb{N}$. We
will write $\mathbf{X} = (X_i)_{i=1}^{n}$ in the case of finitely many observables and $\mathbf{X} = X$ in the case
of only one observable $X$.

Unless otherwise stated, in the following $(\Omega, \mathcal{A}, \mu, T)$ is a measure-preserving
dynamical system and $\mathbf{X} = (X_i)_{i \in \mathbb{N}}$ a sequence of observables.

Kolmogorov-Sinai entropy. In order to recall the Kolmogorov-Sinai entropy, let $q \in \mathbb{N}$
and $\mathcal{P} = \{P_1, P_2, \ldots, P_q\} \subset \mathcal{A}$ be a finite partition of $\Omega$, i.e. $\Omega = \bigcup_{l=1}^{q} P_l$, $P_l \neq \emptyset$ for
$l = 1, 2, \ldots, q$, and $P_{l_1} \cap P_{l_2} = \emptyset$ for different $l_1, l_2 \in \{1, 2, \ldots, q\}$, and let $A = \{1, 2, \ldots, q\}$
be the corresponding alphabet. Each word $a_1 a_2 \ldots a_t$ of length $t \in \mathbb{N}$ defines a set

$$P_{a_1 a_2 \ldots a_t} := \{\omega \in \Omega \mid (\omega, T(\omega), \ldots, T^{\circ t-1}(\omega)) \in P_{a_1} \times P_{a_2} \times \ldots \times P_{a_t}\},$$

and the collection of all non-empty sets obtained for such words of length $t$ provides a
partition $\mathcal{P}_t \subset \mathcal{A}$ of $\Omega$. In particular, $\mathcal{P}_1 = \mathcal{P}$.

The entropy rate of $T$ with respect to an initial partition $\mathcal{P}$ is given by

$$h_\mu(T, \mathcal{P}) = \lim_{t \to \infty} \frac{1}{t} H_\mu(\mathcal{P}_t),$$

where $H_\mu(\mathcal{C})$ denotes the (Shannon) entropy of a finite partition $\mathcal{C} = \{C_1, C_2, \ldots, C_q\} \subset \mathcal{A}$
of $\Omega$; $q \in \mathbb{N}$, i.e.

$$H_\mu(\mathcal{C}) = -\sum_{l=1}^{q} \mu(C_l) \ln \mu(C_l)$$

(with $0 \ln(0) := 0$), and the Kolmogorov-Sinai entropy is defined by

$$h_\mu^{KS}(T) = \sup_{\mathcal{P} \text{ finite partition}} h_\mu(T, \mathcal{P}).$$

Although the Kolmogorov-Sinai entropy is well-defined, its determination is not easy.
In some special cases one can find finite partitions already determining it, usually called
generating partitions (see Definition 6.3). In general, however, such partitions do not exist or are not accessible.
As a substitute, we want to consider special sequences of partitions only depending on
the ordinal structure of a dynamical system.
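The two definitions above can be made concrete with a short Python sketch. It is only an illustration: the function names are hypothetical, and the second function assumes the special situation in which the partition of words of length $t$ consists of $2^t$ cells of measure $2^{-t}$ each (as is the case, for instance, for the tent map on $[0,1]$ with Lebesgue measure and the partition $\{[0,1/2], ]1/2,1]\}$).

```python
import math

def shannon_entropy(measures):
    """H(C) = -sum mu(C_l) ln mu(C_l), using the convention 0 ln 0 = 0."""
    return -sum(m * math.log(m) for m in measures if m > 0)

def entropy_rate_uniform_words(t):
    """(1/t) H(P_t), assuming P_t has 2**t cells of measure 2**-t each."""
    cells = [2.0 ** -t] * (2 ** t)
    return shannon_entropy(cells) / t

if __name__ == "__main__":
    print(shannon_entropy([0.5, 0.5]))    # entropy of a fair two-set partition
    print(entropy_rate_uniform_words(5))  # stays at ln 2 for every t in this case
```

In this special case the quotient $H_\mu(\mathcal{P}_t)/t$ does not depend on $t$, so the limit defining the entropy rate is reached immediately; in general the limit has to be approximated.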


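Ordinal partitioning. For a single observable $X$ on $(\Omega, \mathcal{A}, \mu, T)$ and $s, t \in \mathbb{N}_0$ with
$s < t$, consider the bisection

$$\mathcal{P}^{X}_{s,t} = \{\{\omega \in \Omega \mid X(T^{\circ s}(\omega)) < X(T^{\circ t}(\omega))\},\ \{\omega \in \Omega \mid X(T^{\circ s}(\omega)) \geq X(T^{\circ t}(\omega))\}\} \tag{3}$$

of $\Omega$ and, for observables $X_1, X_2, \ldots, X_n$ on $(\Omega, \mathcal{A}, \mu, T)$ and $d, n \in \mathbb{N}$, the partition

$$\mathcal{P}_d^{(X_i)_{i=1}^{n},T} = \bigvee_{i=1}^{n}\ \bigvee_{0 \leq s < t \leq d} \mathcal{P}^{X_i}_{s,t}, \tag{4}$$

i.e. the coarsest partition refining all bisections $\mathcal{P}^{X_i}_{s,t}$; $i = 1, 2, \ldots, n$, $0 \leq s < t \leq d$. (If
one of the sets on the right hand side of (3) is empty, $\mathcal{P}^{X}_{s,t}$ is considered to consist of
only one set.)

The partition $\mathcal{P}_d^{(X_i)_{i=1}^{n},T}$ is called ordinal partition of order $d$ associated to $(X_i)_{i=1}^{n}$.
By definition, its parts contain all states with equal ordinal measurement structure for
an initial orbit part.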
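To illustrate which states share a part of an ordinal partition, the following Python sketch records, for every observable and every pair $0 \leq s < t \leq d$, on which side of the bisection a state lies. It is a minimal sketch under assumptions: the tent map is used as an example system, the second set of each bisection is read as the non-strict complement, and the names `tent` and `ordinal_signature` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def tent(w):
    """Tent map on [0, 1], used here only as an example of T."""
    return 2 * w if w <= Fraction(1, 2) else 2 - 2 * w

def ordinal_signature(omega, d, observables, step=tent):
    """For each observable X and each 0 <= s < t <= d, record whether
    X(T^s(omega)) < X(T^t(omega)). Two states lie in the same part of the
    ordinal partition of order d exactly when their signatures agree."""
    orbit = [omega]
    for _ in range(d):
        orbit.append(step(orbit[-1]))
    return tuple(
        X(orbit[s]) < X(orbit[t])
        for X in observables
        for s, t in combinations(range(d + 1), 2)
    )

if __name__ == "__main__":
    identity = lambda w: w
    for start in (Fraction(1, 10), Fraction(3, 10), Fraction(9, 10)):
        print(start, ordinal_signature(start, 2, [identity]))
```

Exact rational arithmetic is used so that the comparisons are not disturbed by rounding; with floating point numbers the tent map orbit degenerates after a few dozen iterations.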


A central statement. Clearly, in order to preserve information of the given system,
the observables should separate orbits of the system in a certain sense. In order to give
a precise description, let in the following $\sigma((\mathbf{X} \circ T^{\circ t})_{t \in \mathbb{N}_0})$ be the $\sigma$-algebra generated
by all random variables $X_i \circ T^{\circ t}$; $i \in \mathbb{N}$, $t \in \mathbb{N}_0$, and write $\mathcal{F} \overset{\mu}{\supset} \mathcal{G}$ if for each $G \in \mathcal{G}$ there
exists some $F \in \mathcal{F}$ with $\mu(F \,\Delta\, G) = 0$.

The following generalization of a statement in Antoniouk et al. [6] says that if there
is no information loss by measuring with observables, then all information is also preserved
when the measurements are considered only from the ordinal viewpoint.
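Theorem 1.1. Let $(\Omega, \mathcal{A}, \mu, T)$ be a measure-preserving dynamical system and $\mathbf{X} =
(X_i)_{i \in \mathbb{N}}$ be a sequence of observables such that $\sigma((\mathbf{X} \circ T^{\circ t})_{t \in \mathbb{N}_0}) \overset{\mu}{\supset} \mathcal{A}$. Assume that (1)
or (2) holds. Then

$$h_\mu^{KS}(T) = \lim_{d,n \to \infty} h_\mu(T, \mathcal{P}_d^{(X_i)_{i=1}^{n},T}) = \sup_{d,n \in \mathbb{N}} h_\mu(T, \mathcal{P}_d^{(X_i)_{i=1}^{n},T}). \tag{5}$$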

When Bandt and Pompe [8] invented the permutation entropy, they considered
one-dimensional systems with coincidence of states and measurements. This fits into the
given general approach as follows: $\Omega$ is a Borel subset of $\mathbb{R}$ and only one observable is
considered, namely the identity map $\mathrm{id}$ from $\Omega$ into $\mathbb{R}$. In this situation the assumptions
of Theorem 1.1 are satisfied and so it holds

$$h_\mu^{KS}(T) = \lim_{d \to \infty} h_\mu(T, \mathcal{P}_d^{\mathrm{id},T}) = \sup_{d \in \mathbb{N}} h_\mu(T, \mathcal{P}_d^{\mathrm{id},T})$$

(compare [15, 16]).


Structure of the paper. The paper is organized as follows. In Section 2 we provide 
a proof of Theorem 1.1 on the basis of Antoniouk et al. [6]. We, moreover, discuss 
this statement from different perspectives in Section 3 by presenting its modifications 
and variants. Section 4 is devoted to the concept of permutation entropy, in particular 
to the two different approaches to it given by Bandt et al. in [7] and Amigo et al. in 
[5], respectively, and to its relation to the Kolmogorov-Sinai entropy. The ordinal 
approach to dynamical systems opens new perspectives to the estimation of system 
complexity. Advantages and limitations of this approach are discussed in Section 5. The 
natural question of how many observables are necessary for satisfying the assumptions 
of Theorem 1.1 is in the focus of Section 6. The corresponding discussion is strongly 
related to Takens’ delay embedding and similar ideas (see Takens [23] and Sauer [22]). 


2. Kolmogorov-Sinai entropy from the ordinal viewpoint 

This section is devoted to the proof of Theorem 1.1. 

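Preliminaries. In the following we write $\mathcal{F} \overset{\mu}{=} \mathcal{G}$ if $\mathcal{F} \overset{\mu}{\supset} \mathcal{G}$ and $\mathcal{G} \overset{\mu}{\supset} \mathcal{F}$, and denote
by $1_A$ the indicator function of a subset $A \subset \Omega$. Moreover $\sigma(\mathfrak{C})$ denotes the $\sigma$-algebra
generated by a set $\mathfrak{C}$ of subsets of $\Omega$, by a sequence or double sequence $\mathfrak{C}$ of sets of
subsets of $\Omega$, or by a random variable $\mathfrak{C}$ on $\Omega$.

Given two finite partitions $\mathcal{C}, \mathcal{D} \subset \mathcal{A}$ of $\Omega$, we write $\mathcal{C} \prec \mathcal{D}$ if $\mathcal{D}$ is finer than $\mathcal{C}$ or,
equivalently, if $\mathcal{C}$ is coarser than $\mathcal{D}$, that is, each element $C \in \mathcal{C}$ is a finite union of
some elements of $\mathcal{D}$. Note that $\prec$ on the set of finite partitions of $\Omega$ contained in $\mathcal{A}$ is
a partial order.

The join of $m \in \mathbb{N}$ finite partitions $\mathcal{C}_r = \{C^{(r)}_1, C^{(r)}_2, \ldots, C^{(r)}_{|\mathcal{C}_r|}\} \subset \mathcal{A}$ of $\Omega$
with $r = 1, 2, \ldots, m$ is the coarsest partition refining all $\mathcal{C}_r$; $r = 1, 2, \ldots, m$, i.e.

$$\bigvee_{r=1}^{m} \mathcal{C}_r = \Big\{\bigcap_{r=1}^{m} C^{(r)}_{l_r} \neq \emptyset \ \Big|\ l_r \in \{1, 2, \ldots, |\mathcal{C}_r|\} \text{ for } r = 1, 2, \ldots, m\Big\}.$$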
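For an observable $Y$ on $(\Omega, \mathcal{A}, \mu, T)$ we consider the finite partitions

$$\mathcal{P}_d^{Y,T} := \bigvee_{0 \leq s < t \leq d} \mathcal{P}^{Y}_{s,t} \quad \text{and} \quad \tilde{\mathcal{P}}_d^{Y,T} := \bigvee_{0 < t \leq d} \mathcal{P}^{Y}_{0,t}$$

(compare (3)) for $d \in \mathbb{N}$ and the $\sigma$-algebras $\Sigma^{Y,T}$ and $\tilde{\Sigma}^{Y,T}$ generated from all $\mathcal{P}_d^{Y,T}$ and $\tilde{\mathcal{P}}_d^{Y,T}$;
$d \in \mathbb{N}$, respectively.

Besides $\mathcal{P}_d^{(X_i)_{i=1}^{n},T}$ (compare (4)), for $d, n \in \mathbb{N}$ we are interested in the
finite partitions

$$\tilde{\mathcal{P}}_d^{(X_i)_{i=1}^{n},T} := \bigvee_{i=1}^{n} \tilde{\mathcal{P}}_d^{X_i,T}. \tag{6}$$

Furthermore, we need the following $\sigma$-algebras associated to these partitions:

$$\Sigma^{\mathbf{X},T} := \sigma\big((\mathcal{P}_d^{(X_i)_{i=1}^{n},T})_{d,n \in \mathbb{N}}\big) \quad \text{and} \quad \tilde{\Sigma}^{\mathbf{X},T} := \sigma\big((\tilde{\mathcal{P}}_d^{(X_i)_{i=1}^{n},T})_{d,n \in \mathbb{N}}\big).$$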
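The proof. Although we consider dynamical systems equipped with infinitely many
observables, we can closely follow the argumentation in the paper of Antoniouk et al. [6].
So let us first recall or modify those statements of that paper used in our proof.

Lemma 2.1. [6, Lemma 3.2] Let $F : \mathbb{R} \to [0,1]$ be the distribution function of an
observable $X$, that is $F(a) = \mu(\{\omega \in \Omega \mid X(\omega) \leq a\})$ for all $a \in \mathbb{R}$. Then

$$\sigma(F \circ X) = \sigma(X).$$

Lemma 2.2. [6, Lemma 3.3] Let $T : \Omega \to \Omega$ be an ergodic map and let $I_d : \Omega \to \mathbb{R}$ be
defined by $I_d(\omega) := \sum_{t=1}^{d} 1_{\{X(T^{\circ t}(\omega)) \leq X(\omega)\}}$ for all $d \in \mathbb{N}$ and $\omega \in \Omega$. Then

$$F(X(\omega)) = \lim_{d \to \infty} \frac{I_d(\omega)}{d} \quad \text{for a.e. } \omega \in \Omega.$$

By very slight modifications we can extend [6, Corollary 3.4 and Corollary 3.5] to
countably many observables:

Corollary 2.3. Let $T : \Omega \to \Omega$ be an ergodic map. Then

$$\sigma(\mathbf{X}) \subset \tilde{\Sigma}^{\mathbf{X},T} \subset \Sigma^{\mathbf{X},T}.$$

Proof. Compare to [6, Corollary 3.4]. The $\sigma$-algebra $\tilde{\Sigma}^{\mathbf{X},T}$ is generated by the $\sigma$-algebras
$\tilde{\Sigma}^{X_i,T} = \sigma((\tilde{\mathcal{P}}_d^{X_i,T})_{d \in \mathbb{N}})$; $i \in \mathbb{N}$. Therefore the first inclusion follows from
$\sigma(X_i) \subset \tilde{\Sigma}^{X_i,T}$ for all $i \in \mathbb{N}$. The latter is true since $\frac{I_d}{d} : \Omega \to [0,1]$ is $\tilde{\Sigma}^{X_i,T}$-$\mathbb{B}([0,1])$-measurable
for all $d \in \mathbb{N}$ and hence so are $F \circ X$ and $X$ by Lemma 2.1 and Lemma 2.2. The inclusion
$\tilde{\Sigma}^{\mathbf{X},T} \subset \Sigma^{\mathbf{X},T}$ is given by construction (compare (4) and (6)). □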
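Corollary 2.4. Let $T : \Omega \to \Omega$ be an ergodic map. Then

$$\sigma((\mathbf{X} \circ T^{\circ t})_{t \in \mathbb{N}_0}) \subset \Sigma^{\mathbf{X},T}.$$

Proof. For fixed $n \in \mathbb{N}$, in [6, Proof of Corollary 3.5] it is shown that

(7) $\mathcal{P}_d^{X_i \circ T,T} \prec \mathcal{P}_{d+1}^{X_i,T}$ for all $d \in \mathbb{N}$ and $i = 1, 2, \ldots, n$

implying

(8) $\Sigma^{X_i \circ T^{\circ t},T} \subset \Sigma^{X_i,T}$ for all $i = 1, 2, \ldots, n$ and $t \in \mathbb{N}_0$.

Moreover, Corollary 2.3 gives

(9) $\sigma(\mathbf{X} \circ T^{\circ t}) \subset \Sigma^{\mathbf{X} \circ T^{\circ t},T}$ for all $t \in \mathbb{N}_0$.

Consequently, $\sigma((\mathbf{X} \circ T^{\circ t})_{t \in \mathbb{N}_0}) \subset \Sigma^{\mathbf{X},T}$. □

Lemma 2.5. $(\mathcal{P}_d^{(X_i)_{i=1}^{n},T})_{d,n \in \mathbb{N}}$ is an increasing sequence in $n$ for fixed $d$, and for fixed
$n$ it is an increasing sequence in $d$.

In particular, $(\mathcal{P}_{d_j}^{(X_i)_{i=1}^{n_j},T})_{j \in \mathbb{N}}$ is an increasing sequence in $j$ if $(d_j)_{j \in \mathbb{N}}$ and $(n_j)_{j \in \mathbb{N}}$
are increasing sequences in $\mathbb{N}$.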

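Proof. Given $d, n \in \mathbb{N}$, it holds

$$\mathcal{P}_d^{(X_i)_{i=1}^{n},T} = \bigvee_{i=1}^{n}\ \bigvee_{0 \leq s < t \leq d} \mathcal{P}^{X_i}_{s,t},$$

$$\mathcal{P}_{d+1}^{(X_i)_{i=1}^{n},T} = \bigvee_{i=1}^{n}\ \bigvee_{0 \leq s < t \leq d+1} \mathcal{P}^{X_i}_{s,t},$$

$$\mathcal{P}_d^{(X_i)_{i=1}^{n+1},T} = \bigvee_{i=1}^{n+1}\ \bigvee_{0 \leq s < t \leq d} \mathcal{P}^{X_i}_{s,t},$$

implying $\mathcal{P}_d^{(X_i)_{i=1}^{n},T} \prec \mathcal{P}_{d+1}^{(X_i)_{i=1}^{n},T}$ and $\mathcal{P}_d^{(X_i)_{i=1}^{n},T} \prec \mathcal{P}_d^{(X_i)_{i=1}^{n+1},T}$, and so the above statements. □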

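To complete the proof of Theorem 1.1, we apply the following statement (see
Walters [27, Theorem 4.22]):

Lemma 2.6. For a sequence $(\mathcal{C}_d)_{d \in \mathbb{N}}$ of finite partitions $\mathcal{C}_d \subset \mathcal{A}$ of $\Omega$ increasing with
respect to $\prec$ and satisfying $\sigma((\mathcal{C}_d)_{d \in \mathbb{N}}) \overset{\mu}{\supset} \mathcal{A}$, it holds

$$h_\mu^{KS}(T) = \lim_{d \to \infty} h_\mu(T, \mathcal{C}_d).$$

First suppose that $T$ is an ergodic map. Then under the assumptions of Theorem
1.1 and by Corollary 2.4 it holds $\mathcal{A} \overset{\mu}{\subset} \sigma((\mathbf{X} \circ T^{\circ t})_{t \in \mathbb{N}_0}) \subset \Sigma^{\mathbf{X},T}$. Since by Lemma
2.5 $(\mathcal{P}_{d_j}^{(X_i)_{i=1}^{n_j},T})_{j \in \mathbb{N}}$ is an increasing sequence in $j$ with respect to $\prec$ for increasing
sequences $(d_j)_{j \in \mathbb{N}}$ and $(n_j)_{j \in \mathbb{N}}$ in $\mathbb{N}$, the assertion of Theorem 1.1 follows from Lemma
2.6.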

In the non-ergodic case the ergodic decomposition theorem is used. For a thorough
treatment we refer the reader to Einsiedler and Ward [10] and Einsiedler et al. [9].
In particular, the ergodic decomposition theorem states that under certain conditions
any $T$-invariant measure $\mu$ can be decomposed into ergodic components and, subsequently,
the entropy rate as well as the Kolmogorov-Sinai entropy of $T$ with respect to
$\mu$ can be written as the integral of the entropies with respect to the decomposition.

In order to complete the proof of Theorem 1.1, we apply the following statement (see
Einsiedler and Ward [10, Theorem 6.2], Einsiedler et al. [9, Theorem 5.27] and Keller
and Sinn [15] for the case of a non-invertible $T$):


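Theorem 2.7. Let $(\Omega, \mathcal{A}, \mu, T)$ be a measure-preserving dynamical system satisfying
(2). Then there exists a probability space $(\Omega^*, \mathcal{A}^*, \nu)$ and a map $\omega^* \mapsto \mu_{\omega^*}$ associating
to each $\omega^* \in \Omega^*$ a probability measure $\mu_{\omega^*}$ on $(\Omega, \mathcal{A})$ such that the following is valid:
$\Omega^*$ can be embedded into some compact metrizable space so that $\mathcal{A}^* = \mathbb{B}(\Omega^*)$, the map
$\omega^* \in \Omega^* \mapsto \int_\Omega f \, d\mu_{\omega^*}$ is $\mathcal{A}^*$-$\mathbb{B}(\mathbb{R})$-measurable for every essentially bounded measurable
function $f : \Omega \to \mathbb{R}$, the measure $\mu_{\omega^*}$ is ergodic and $T$-invariant for $\nu$-a.e. $\omega^* \in \Omega^*$, and

$$\mu = \int_{\Omega^*} \mu_{\omega^*} \, d\nu(\omega^*).$$

Moreover, it holds

$$h_\mu^{KS}(T) = \int_{\Omega^*} h_{\mu_{\omega^*}}^{KS}(T) \, d\nu(\omega^*) \tag{10}$$

and

$$h_\mu(T, \mathcal{P}) = \int_{\Omega^*} h_{\mu_{\omega^*}}(T, \mathcal{P}) \, d\nu(\omega^*) \quad \text{for each finite partition } \mathcal{P} \subset \mathcal{A} \text{ of } \Omega. \tag{11}$$

Altogether we obtain

$$h_\mu^{KS}(T) \overset{(10)}{=} \int_{\Omega^*} h_{\mu_{\omega^*}}^{KS}(T) \, d\nu(\omega^*) = \int_{\Omega^*} \lim_{j \to \infty} h_{\mu_{\omega^*}}(T, \mathcal{P}_{d_j}^{(X_i)_{i=1}^{n_j},T}) \, d\nu(\omega^*) = \lim_{j \to \infty} \int_{\Omega^*} h_{\mu_{\omega^*}}(T, \mathcal{P}_{d_j}^{(X_i)_{i=1}^{n_j},T}) \, d\nu(\omega^*),$$

where the second equality holds by Theorem 1.1 in the ergodic case and the third one by monotone
convergence. By (11), the last expression equals $\lim_{j \to \infty} h_\mu(T, \mathcal{P}_{d_j}^{(X_i)_{i=1}^{n_j},T})$.
Here $(n_j)_{j \in \mathbb{N}}$ and $(d_j)_{j \in \mathbb{N}}$ are strictly increasing sequences of natural numbers.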


3. Modifications and consequences of Theorem 1.1.

We want to have a closer look at Theorem 1.1. For this recall that $\mathbf{X} \circ T^{\circ t}$ can be
interpreted as a measurement of a system at time $t$. As discussed in Section 1, there
is no information loss when taking a purely ordinal viewpoint in the case that these
measurements have ‘separating properties’.

Fewer comparisons. The main Theorem 1.1 can be given in a relaxed version if the
considered observables provide a ‘separation’ from the outset (compare also [16, 17]).
This means that, in the case of ‘separating’ original observables, in order to determine
the Kolmogorov-Sinai entropy one does not need all comparisons between the elements of an orbit
but only comparisons between points and their iterates.


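Theorem 3.1. Let $(\Omega, \mathcal{A}, \mu, T)$ be a measure-preserving dynamical system and $\mathbf{X} =
(X_i)_{i \in \mathbb{N}}$ be a sequence of observables such that $\sigma(\mathbf{X}) \overset{\mu}{\supset} \mathcal{A}$. Assume that (1) or (2)
holds. Then

$$h_\mu^{KS}(T) = \lim_{d,n \to \infty} h_\mu(T, \tilde{\mathcal{P}}_d^{(X_i)_{i=1}^{n},T}) = \sup_{d,n \in \mathbb{N}} h_\mu(T, \tilde{\mathcal{P}}_d^{(X_i)_{i=1}^{n},T}).$$

For an ergodic map $T$ we have that $\mathcal{A} \overset{\mu}{\subset} \tilde{\Sigma}^{\mathbf{X},T}$, which follows from Corollary 2.3 and
the assumption $\sigma(\mathbf{X}) \overset{\mu}{\supset} \mathcal{A}$. Moreover $(\tilde{\mathcal{P}}_d^{(X_i)_{i=1}^{n},T})_{d,n \in \mathbb{N}}$ is an increasing sequence in $d$
and $n$ with respect to $\prec$, as can be shown analogously to the proof of Lemma 2.5.
Thus, for $T$ ergodic the assertion follows by Lemma 2.6. To show the non-ergodic case
one can use the ergodic decomposition theorem as in the proof of Theorem 1.1.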

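It seems that the assumption $\sigma(\mathbf{X}) \overset{\mu}{\supset} \mathcal{A}$ in Theorem 3.1 cannot be replaced by the
assumption $\sigma((\mathbf{X} \circ T^{\circ t})_{t \in \mathbb{N}_0}) \overset{\mu}{\supset} \mathcal{A}$ of Theorem 1.1. At least, the argumentation of the
proof of Corollary 2.4 cannot be adapted. Whereas

$$\sigma(\mathbf{X} \circ T^{\circ t}) \subset \tilde{\Sigma}^{\mathbf{X} \circ T^{\circ t},T} \quad \text{for all } t \in \mathbb{N}_0$$

is true as (9) is, the analogue

$$\tilde{\mathcal{P}}_d^{X_i \circ T,T} \prec \tilde{\mathcal{P}}_{d+1}^{X_i,T} \quad \text{for all } d \in \mathbb{N} \text{ and } i = 1, 2, \ldots, n$$

of (7) is false. Therefore the analogue

$$\tilde{\Sigma}^{X_i \circ T^{\circ t},T} \subset \tilde{\Sigma}^{X_i,T} \quad \text{for all } i = 1, 2, \ldots, n \text{ and } t \in \mathbb{N}_0$$

of (8) is not guaranteed. Let us give an example.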


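Example 3.2. Let $\Omega = [0,1]$ and $T : \Omega \to \Omega$ be defined by

$$T(\omega) = \begin{cases} 2\omega & \text{for } \omega \leq \frac{1}{2}, \\ 2 - 2\omega & \text{else.} \end{cases}$$

($T$ is the tent map preserving the equidistribution on $[0,1]$.) Let

$$Y = 2 \cdot 1_{[0,1/3]} + 3 \cdot 1_{]1/3,2/3]} + 1_{]2/3,1]},$$

$\omega_1 = 1$ and $\omega_2 = \frac{5}{6}$. Then

$$(Y(T^{\circ t}(\omega_1)))_{t \in \mathbb{N}_0} = (1, 2, 2, 2, 2, 2, \ldots),$$

$$(Y(T^{\circ t}(\omega_2)))_{t \in \mathbb{N}_0} = (1, 2, 3, 3, 3, 3, \ldots).$$

It follows that $\omega_1$ and $\omega_2$ are separated by $\mathcal{P}^{Y \circ T}_{0,1}$ and hence by $\tilde{\mathcal{P}}_d^{Y \circ T,T}$ for all $d \in \mathbb{N}$, but
are not separated by $\tilde{\mathcal{P}}_d^{Y,T}$ for all $d \in \mathbb{N}$. Consequently, $\tilde{\mathcal{P}}_d^{Y \circ T,T} \not\prec \tilde{\mathcal{P}}_{d+l}^{Y,T}$ for all $d \in \mathbb{N}$ and
$l \in \mathbb{N}_0$.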
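The orbits in Example 3.2 can be checked with exact rational arithmetic. The following Python sketch is only an illustration under stated assumptions: it takes $\omega_1 = 1$ and $\omega_2 = 5/6$ (the latter being the point of $]2/3, 1]$ that is mapped to $1/3$ and then to the fixed point $2/3$, which produces the value sequence $(1, 2, 3, 3, \ldots)$), and the helper names are hypothetical. Only comparisons of a point with its iterates are used, as in Theorem 3.1.

```python
from fractions import Fraction

def tent(w):
    """Tent map on [0, 1]."""
    return 2 * w if w <= Fraction(1, 2) else 2 - 2 * w

def Y(w):
    """Y = 2 on [0, 1/3], 3 on ]1/3, 2/3], 1 on ]2/3, 1]."""
    if w <= Fraction(1, 3):
        return 2
    if w <= Fraction(2, 3):
        return 3
    return 1

def itinerary(obs, w, length):
    """Values obs(w), obs(T(w)), ..., obs(T^(length-1)(w))."""
    values = []
    for _ in range(length):
        values.append(obs(w))
        w = tent(w)
    return values

def base_comparisons(values):
    """Compare the first value with each later one (pairs with s = 0 only)."""
    return [values[0] < v for v in values[1:]]

if __name__ == "__main__":
    w1, w2 = Fraction(1), Fraction(5, 6)
    print(itinerary(Y, w1, 6), itinerary(Y, w2, 6))
    # Comparisons for Y itself: identical for both states.
    print(base_comparisons(itinerary(Y, w1, 8)) == base_comparisons(itinerary(Y, w2, 8)))
    # Comparisons for Y o T (start the itinerary at T(w)): they differ.
    print(base_comparisons(itinerary(Y, tent(w1), 8)) == base_comparisons(itinerary(Y, tent(w2), 8)))
```

The check covers only finitely many iterates; since both orbits are constant from the third point on, the comparisons do not change for longer itineraries.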
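Other partitions. For a single observable $X$ on a measure-preserving dynamical
system $(\Omega, \mathcal{A}, \mu, T)$ and $s, t \in \mathbb{N}_0$ with $s < t$, let

$$\mathcal{Q}^{X}_{s,t} = \{\{\omega \in \Omega \mid X(T^{\circ s}(\omega)) > X(T^{\circ t}(\omega))\},\ \{\omega \in \Omega \mid X(T^{\circ s}(\omega)) \leq X(T^{\circ t}(\omega))\}\}$$

and

$$\mathcal{R}^{X}_{s,t} = \{\{\omega \in \Omega \mid X(T^{\circ s}(\omega)) < X(T^{\circ t}(\omega))\},\ \{\omega \in \Omega \mid X(T^{\circ s}(\omega)) > X(T^{\circ t}(\omega))\},\ \{\omega \in \Omega \mid X(T^{\circ s}(\omega)) = X(T^{\circ t}(\omega))\}\}.$$

Further, for observables $X_1, X_2, \ldots, X_n$ on $(\Omega, \mathcal{A}, \mu, T)$ and $d \in \mathbb{N}$, let

$$\mathcal{Q}_d^{(X_i)_{i=1}^{n},T} = \bigvee_{i=1}^{n}\ \bigvee_{0 \leq s < t \leq d} \mathcal{Q}^{X_i}_{s,t} \tag{12}$$

and

$$\mathcal{R}_d^{(X_i)_{i=1}^{n},T} = \bigvee_{i=1}^{n}\ \bigvee_{0 \leq s < t \leq d} \mathcal{R}^{X_i}_{s,t}. \tag{13}$$

(If one of the sets in the definition of $\mathcal{Q}^{X}_{s,t}$ or $\mathcal{R}^{X}_{s,t}$ is empty, then it is omitted
in order to have only nonempty sets.) Then the following is valid: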

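Corollary 3.3. The statement of Theorem 1.1 remains true when substituting $\mathcal{P}_d^{(X_i)_{i=1}^{n},T}$
by $\mathcal{Q}_d^{(X_i)_{i=1}^{n},T}$ or by $\mathcal{R}_d^{(X_i)_{i=1}^{n},T}$.

Proof. Application of Theorem 1.1 to $-\mathbf{X} = (-X_i)_{i \in \mathbb{N}}$ provides

$$h_\mu^{KS}(T) = \lim_{d,n \to \infty} h_\mu(T, \mathcal{P}_d^{(-X_i)_{i=1}^{n},T}) = \lim_{d,n \to \infty} h_\mu(T, \mathcal{Q}_d^{(X_i)_{i=1}^{n},T}).$$

Moreover, each $\mathcal{R}_d^{(X_i)_{i=1}^{n},T}$ is finer than $\mathcal{P}_d^{(X_i)_{i=1}^{n},T}$, implying

$$h_\mu(T, \mathcal{R}_d^{(X_i)_{i=1}^{n},T}) \geq h_\mu(T, \mathcal{P}_d^{(X_i)_{i=1}^{n},T}).$$

Therefore

$$h_\mu^{KS}(T) \geq \lim_{d,n \to \infty} h_\mu(T, \mathcal{R}_d^{(X_i)_{i=1}^{n},T}) \geq \lim_{d,n \to \infty} h_\mu(T, \mathcal{P}_d^{(X_i)_{i=1}^{n},T}) = h_\mu^{KS}(T).$$

The existence of the limit

$$\lim_{d,n \to \infty} h_\mu(T, \mathcal{R}_d^{(X_i)_{i=1}^{n},T})$$

and its coincidence with the corresponding supremum is obvious (compare the discussion
for $\mathcal{P}_d^{(X_i)_{i=1}^{n},T}$ in Section 2). □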

Let us consider an order $\prec$ between observables $X, Y$ by $X \prec Y$ iff for all $\omega_1, \omega_2 \in \Omega$
the following holds (compare [3]):

$$Y(\omega_1) \leq Y(\omega_2) \text{ implies } X(\omega_1) \leq X(\omega_2).$$

One easily shows the following: 




K. KELLER, S. MAKSYMENKO AND I. STOLZ


Lemma 3.4. For X -<Y it holds -< . 

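Note that for $X \prec Y$ not generally $\mathcal{P}_d^{X,T} \prec \mathcal{P}_d^{Y,T}$ and $\mathcal{Q}_d^{X,T} \prec \mathcal{Q}_d^{Y,T}$. After the
following corollary, which is an immediate consequence of Theorem 1.1, we will illustrate
this point by an example.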


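Corollary 3.5. Let $(\Omega, \mathcal{A}, \mu, T)$ be a measure-preserving dynamical system and $\mathbf{X} =
(X_i)_{i \in \mathbb{N}}$ be a sequence of observables with $X_1 \prec X_2 \prec X_3 \prec \ldots$ and $\sigma((\mathbf{X} \circ T^{\circ t})_{t \in \mathbb{N}_0}) \overset{\mu}{\supset}
\mathcal{A}$. Assume that (1) or (2) holds. Then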

ftf (T) = lim = sup 

d,i^oo d,ieN 


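Example 3.6. See Example 3.2 and let

$$X = 2 \cdot 1_{[0,5/8]} + 1_{]5/8,1]}$$

and

$$Y = 4 \cdot 1_{[0,1/8] \cup [3/8,5/8]} + 3 \cdot 1_{]1/8,3/8[} + 1_{]5/8,1]}.$$

Obviously, $X \prec Y$. Let $\omega_1 = \frac{1}{4}$ and $\omega_2 = \frac{3}{4}$. Then

$$(X(T^{\circ t}(\omega_1)))_{t \in \mathbb{N}_0} = (2, 2, 1, 2, 2, 2, 2, 2, 2, \ldots),$$

$$(X(T^{\circ t}(\omega_2)))_{t \in \mathbb{N}_0} = (1, 2, 1, 2, 2, 2, 2, 2, 2, \ldots),$$

$$(Y(T^{\circ t}(\omega_1)))_{t \in \mathbb{N}_0} = (3, 4, 1, 4, 4, 4, 4, 4, 4, \ldots),$$

and

$$(Y(T^{\circ t}(\omega_2)))_{t \in \mathbb{N}_0} = (1, 4, 1, 4, 4, 4, 4, 4, 4, \ldots).$$

From this, on the one hand it follows that $\omega_1$ and $\omega_2$ are separated by $\mathcal{P}^{X}_{0,1}$, i.e. belong to
different elements of $\mathcal{P}^{X}_{0,1}$, hence are separated by $\mathcal{P}_d^{X,T}$ for all $d \in \mathbb{N}$. On the other
hand, it follows that $\omega_1$ and $\omega_2$ are not separated by $\mathcal{P}_d^{Y,T}$ for all $d \in \mathbb{N}$.
Therefore for no $d \in \mathbb{N}$ the partition $\mathcal{P}_d^{Y,T}$ is finer than $\mathcal{P}_d^{X,T}$. The same is true for
$\mathcal{Q}_d^{Y,T}$ and $\mathcal{Q}_d^{X,T}$ since $\mathcal{Q}_d^{Z,T} = \mathcal{P}_d^{-Z,T}$ for an observable $Z$ on $(\Omega, \mathcal{A}, \mu, T)$.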
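The value sequences of Example 3.6 can be reproduced with exact arithmetic. The sketch below is only an illustration under assumptions: it takes $\omega_1 = 1/4$ and $\omega_2 = 3/4$ (both are mapped to $1/2$, then to $1$ and to the fixed point $0$, which matches the listed sequences), it reads the bisection (3) as "strictly smaller" versus "not smaller", and all function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def tent(w):
    """Tent map on [0, 1]."""
    return 2 * w if w <= Fraction(1, 2) else 2 - 2 * w

def X(w):
    """X = 2 on [0, 5/8], 1 on ]5/8, 1]."""
    return 2 if w <= Fraction(5, 8) else 1

def Y(w):
    """Y = 4 on [0, 1/8] and [3/8, 5/8], 3 on ]1/8, 3/8[, 1 on ]5/8, 1]."""
    if w <= Fraction(1, 8) or Fraction(3, 8) <= w <= Fraction(5, 8):
        return 4
    if w < Fraction(3, 8):
        return 3
    return 1

def itinerary(obs, w, length):
    values = []
    for _ in range(length):
        values.append(obs(w))
        w = tent(w)
    return values

def all_comparisons(values):
    """Strict comparisons values[s] < values[t] for all index pairs s < t."""
    return [values[s] < values[t] for s, t in combinations(range(len(values)), 2)]

if __name__ == "__main__":
    w1, w2 = Fraction(1, 4), Fraction(3, 4)
    print(itinerary(X, w1, 9), itinerary(X, w2, 9))
    print(itinerary(Y, w1, 9), itinerary(Y, w2, 9))
    # Separated by the comparisons of X, not by those of Y.
    print(all_comparisons(itinerary(X, w1, 9)) != all_comparisons(itinerary(X, w2, 9)))
    print(all_comparisons(itinerary(Y, w1, 9)) == all_comparisons(itinerary(Y, w2, 9)))
```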


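Remark 3.7. Each finite partition $\mathcal{C} = \{C_1, C_2, \ldots, C_q\} \subset \mathcal{A}$; $q \in \mathbb{N}$ is generated
by observables of the form $X = \sum_{l=1}^{q} a_l \cdot 1_{C_l}$ in the sense that $C_l = X^{-1}(a_l)$ for all
$l = 1, 2, \ldots, q$, where $a_l$; $l = 1, 2, \ldots, q$ are different real numbers. If a partition $\mathcal{D} \subset \mathcal{A}$
is finer than $\mathcal{C}$, then it can be written as

$$\mathcal{D} = \bigcup_{l=1}^{q} \{D^{(l)}_j \mid j = 1, 2, \ldots, m_l\}$$

with $m_1, m_2, \ldots, m_q \in \mathbb{N}$ and $C_l = \bigcup_{j=1}^{m_l} D^{(l)}_j$.
If $X = \sum_{l=1}^{q} a_l \cdot 1_{C_l}$ for different $a_l \in \mathbb{N}$ and if $m > m_l$ for all $l = 1, 2, \ldots, q$, then for

$$Y = \sum_{l=1}^{q} \sum_{j=1}^{m_l} (a_l \cdot m + j)\, 1_{D^{(l)}_j}$$

it holds $X \prec Y$. This shows that an increasing sequence $(\mathcal{C}_d)_{d \in \mathbb{N}}$ can be ‘generated’ by
a sequence $(X_d)_{d \in \mathbb{N}}$ of observables with $X_1 \prec X_2 \prec X_3 \prec \ldots$ .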
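The construction in Remark 3.7 can be tried out on the level of values. The following Python sketch is an illustration under assumptions: it reads $X \prec Y$ as "$Y(\omega_1) \leq Y(\omega_2)$ implies $X(\omega_1) \leq X(\omega_2)$", it represents each cell $D^{(l)}_j$ only by the pair of values that $X$ and $Y$ take on it, it chooses $m = \max_l m_l + 1$, and the function names are hypothetical.

```python
def refine_values(a, sizes):
    """a[l] is the (natural number) value of X on C_l, sizes[l] = m_l is the
    number of cells D_j^(l) into which C_l is split. Returns one pair
    (value of X, value of Y) per cell, with Y = a_l * m + j on D_j^(l)."""
    m = max(sizes) + 1  # any m larger than all m_l works
    return [
        (a_l, a_l * m + j)
        for a_l, m_l in zip(a, sizes)
        for j in range(1, m_l + 1)
    ]

def respects_order(pairs):
    """Check: Y(w1) <= Y(w2) implies X(w1) <= X(w2) for all pairs of cells."""
    return all(
        x1 <= x2
        for x1, y1 in pairs
        for x2, y2 in pairs
        if y1 <= y2
    )

if __name__ == "__main__":
    pairs = refine_values([1, 2, 3], [2, 1, 3])
    print(pairs)
    print(respects_order(pairs))
```

Multiplying by $m$ leaves gaps of length $m$ between the values $a_l \cdot m$, so the additional labels $j \leq m_l < m$ never reach the next block; this is why the order between different cells $C_l$ is kept.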
4. Permutation entropy

The idea of considering dynamical systems from the ordinal viewpoint is strongly
related to the invention of the permutation entropy, which we want to discuss now. We
first give a definition of it in our general framework:

Definition 4.1. Given a sequence $\mathbf{X} = (X_i)_{i \in \mathbb{N}}$ of observables on a measure-preserving
dynamical system $(\Omega, \mathcal{A}, \mu, T)$, we define the permutation entropy $h^*_\mu(T, \mathbf{X})$ with respect
to $\mathbf{X}$ by

$$h^*_\mu(T, \mathbf{X}) = \lim_{n \to \infty} \limsup_{d \to \infty} \frac{1}{d} H_\mu(\mathcal{P}_d^{(X_i)_{i=1}^{n},T}). \tag{14}$$

Originally, the definition of permutation entropy was given by Bandt et al. in [7]
directly for one-dimensional systems. In our framework, this is $h^*_\mu(T, \mathrm{id})$ with $T$ being
an interval map.

Permutation and Kolmogorov-Sinai entropy. One reason for investigating the permutation
entropy is its close relationship to the well-established Kolmogorov-Sinai entropy,
first observed by Bandt et al. in [7]. In their seminal paper they have shown that both
entropies coincide for piecewise monotone interval maps $T$, i.e. for selfmaps $T$ on
intervals splitting into finitely many subintervals on which $T$ is continuous and monotone.

Moreover, in the case that $\sigma((\mathbf{X} \circ T^{\circ t})_{t \in \mathbb{N}_0}) \overset{\mu}{\supset} \mathcal{A}$ and that (1) or (2) holds, the
Kolmogorov-Sinai entropy is not larger than the permutation entropy. For finitely
many observables it holds

$$\lim_{d \to \infty} h_\mu(T, \mathcal{P}_d^{(X_i)_{i=1}^{n},T}) \leq \limsup_{d \to \infty} \frac{1}{d} H_\mu(\mathcal{P}_d^{(X_i)_{i=1}^{n},T}) \quad \text{for all } n \in \mathbb{N}$$

(see Keller et al. [18, Corollary 3]), hence the corresponding inequality for infinitely
many observables follows by letting $n$ tend to infinity. So let us summarize:

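Corollary 4.2. Let $(\Omega, \mathcal{A}, \mu, T)$ be a measure-preserving dynamical system and $\mathbf{X} =
(X_i)_{i \in \mathbb{N}}$ be a sequence of observables such that $\sigma((\mathbf{X} \circ T^{\circ t})_{t \in \mathbb{N}_0}) \overset{\mu}{\supset} \mathcal{A}$. Assume that (1)
or (2) holds. Then

$$h_\mu^{KS}(T) \leq h^*_\mu(T, \mathbf{X}).$$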

The approach of Amigo et al. [3, 5]. This approach to permutation entropy, which differs
from the original one, is based on a refining sequence of finite partitions and is justified by the
following statement due to Amigo et al. [3, 5]. We express the statement by finite-valued
observables and refer here to Remark 3.7.

Theorem 4.3. For a measure-preserving dynamical system $(\Omega, \mathcal{A}, \mu, T)$ the following
is valid:

(i) If $X$ is a finitely-valued observable, and $\mathcal{P}$ the finite partition generated by $X$,
then

$$h_\mu(T, \mathcal{P}) = h^*_\mu(T, X).$$

(ii) If $(X_i)_{i \in \mathbb{N}}$ is a sequence of finitely-valued observables with $X_1 \prec X_2 \prec X_3 \prec \ldots$
and the corresponding sequence of finite partitions generates $\mathcal{A}$, then

$$h_\mu^{KS}(T) = \lim_{i \to \infty} h^*_\mu(T, X_i). \tag{15}$$

One immediately sees that by Lemma 2.6 assertion (ii) follows directly from
statement (i). Amigo et al. took the right hand side of (15) as their modified concept of
permutation entropy before showing its equality to the Kolmogorov-Sinai entropy.

We want to finish this section by stating the following general problem, which is
interesting on different levels, from the original one-dimensional definition of permutation
entropy to the generalization for finitely or infinitely many observables.

Problem. Do the Kolmogorov-Sinai entropy and the permutation entropy coincide in general
and, if not, under which assumptions do they coincide?

Note that the purely combinatorial part of the problem is relatively well understood
(see Unakafova et al. [26], Keller et al. [18]).


5. Ordinal time series analysis 


Ever since the idea of Bandt and Pompe [8] to consider the rank order of consecutive
values of a time series instead of the values themselves, the ordinal approach has attracted
increasing attention and has been applied in many scientific fields, for example in biomedical
research, engineering and econophysics (see Amigo et al. [1, 2], Zanin et al. [28] and the
references given there).

The reason is that the ordinal viewpoint brings with it many advantages, especially for
measuring complexity, such as robustness against small noise, simplicity of application
and interpretation, and low computational costs. As mentioned, the determination of the
Kolmogorov-Sinai entropy is usually not easy. Our discussion above, however, suggests
that the ordinal approach can be used as a framework for estimating the Kolmogorov-Sinai
entropy of dynamical systems and similar quantities from real world data.

In the following we consider the theory developed in the previous sections in an
applied context and discuss the pros and cons of using this approach in view of studying
long and complex time series. A detailed exposition of this ordinal pattern approach is
provided in Keller et al. [17].

Ordinal patterns. The task of gaining information about an underlying system via
measurements is a common everyday problem. As already mentioned, this issue is
increasingly addressed by using information lying in the ordinal structure of a system
or a time series obtained from it. This leads to considering the ups and downs in a time
series, which can be described via so-called ordinal patterns.

Definition 5.1. For $d \in \mathbb{N}$ denote the set of permutations of $\{0, 1, \ldots, d\}$ by $\Pi_d$. We
say that a real vector $(x_i)_{i=0}^{d}$ has ordinal pattern $\pi = (\pi_0, \pi_1, \ldots, \pi_d) \in \Pi_d$ of order $d$ if

$$x_{\pi_0} \geq x_{\pi_1} \geq \ldots \geq x_{\pi_d}$$

and

$$\pi_{u-1} > \pi_u \text{ if } x_{\pi_{u-1}} = x_{\pi_u} \text{ for any } u \in \{1, 2, \ldots, d\}. \tag{16}$$

Given a time series $(x_t)_{t \in \mathbb{N}_0}$, the ordinal pattern of order $d$ at time $t$ is defined as that
of $(x_t, x_{t+1}, \ldots, x_{t+d})$, denoted by $\pi_t$.

Example 5.2. In Figure 1 we consider a time series of 50 data points where, as an example,
the ordinal pattern $\pi_{10} = (0, 5, 3, 4, 6, 1, 2) \in \Pi_6$ is emphasized, which corresponds to
the order relation of the seven successive values at $t = 10$, that is

$$x_t \geq x_{t+5} \geq x_{t+3} \geq x_{t+4} \geq x_{t+6} \geq x_{t+1} \geq x_{t+2} \quad \text{for } t = 10.$$

Figure 1. Illustration of an ordinal pattern of order $d = 6$ assigned to
seven successive values (plotted in the vertical direction) of a time series of
50 data points.
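Determining the ordinal pattern of a vector amounts to sorting its indices. The following Python sketch is a minimal illustration under assumptions: it reads Definition 5.1 as "indices in order of decreasing value, and among equal values the larger index first", it takes the pattern at time $t$ from the forward window $(x_t, \ldots, x_{t+d})$, and the function names are hypothetical.

```python
def ordinal_pattern(values):
    """Indices 0..d sorted by decreasing value; for equal values the larger
    index comes first (tie rule as in (16))."""
    return tuple(sorted(range(len(values)), key=lambda i: (-values[i], -i)))

def pattern_series(x, d):
    """Ordinal patterns of order d of all windows (x[t], ..., x[t+d])."""
    return [ordinal_pattern(x[t:t + d + 1]) for t in range(len(x) - d)]

if __name__ == "__main__":
    # A window whose largest value is at index 0, followed by 5, 3, 4, 6, 1, 2.
    window = [7, 2, 1, 5, 4, 6, 3]
    print(ordinal_pattern(window))
    print(pattern_series([3, 1, 2, 2, 5], 2))
```

A series of length $N$ yields $N - d$ patterns of order $d$, since each pattern needs $d + 1$ consecutive values.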


It is easily seen that, following the framework given in Section 1, two states $\omega_1 \in \Omega$
and $\omega_2 \in \Omega$ belong to the same part of some ordinal partition iff the ordinal
patterns of the vectors

$$(X_i(\omega_1), X_i(T(\omega_1)), \ldots, X_i(T^{\circ d}(\omega_1))) \quad \text{and} \quad (X_i(\omega_2), X_i(T(\omega_2)), \ldots, X_i(T^{\circ d}(\omega_2)))$$

coincide. Clearly, the other partitions considered previously (see Equations (6), (12) and
(13)) can, apart from some adjustments in terms of equality, be coherently integrated into this
ordinal approach by redefining ordinal patterns with respect to the equality of values. The
setting (16) is in some sense arbitrary here; however, the proposed definition of ordinal
patterns has established itself. We will use it in the following to demonstrate how
the theory covered so far provides interesting and promising tools for extracting the
information stored in an ordinal pattern sequence, for example, by estimating
the permutation entropy (see Equation (14)) or by approximating the Kolmogorov-Sinai
entropy.

In order to utilize ordinal patterns for the analysis of a system, sequential data $(x_t)_{t \in \mathbb{N}_0}$
obtained from a given measurement are transformed into a series $(\pi_t)_{t \in \mathbb{N}_0}$ of ordinal
patterns. Distributions of ordinal patterns obtained from this approach are the central
objects of exploration.

Note that ordinal patterns do not provide a symbolic representation in the usual sense,
since partitions of the state space are not given a priori, but are created on
the basis of the given dynamics. However, the ordinal patterns as ‘symbols’ are very
simple objects which are directly obtained from the orbits of the system and contain
intrinsic causal information. For the relationship between symbolic dynamics and
representations and ordinal time series analysis see Amigo et al. [1].
For simplicity, we now restrict our exposition to the one-dimensional case with only
one measurement. What we have in mind is a measure-preserving dynamical system
$(\Omega, \mathbb{B}(\Omega), \mu, T)$, where $\Omega$ is a Borel subset of $\mathbb{R}$, acting as the model of a system, with a
single observable $X$ being the identity map. The extension of the ideas to the general
case is obvious.

Estimation of ordinal quantities. The naive and most widely used estimator of ordinal
pattern probabilities, that is, of the probabilities of the parts of the ordinal partition, is the relative
frequency of ordinal patterns in an orbit of some length. For some $t, d \in \mathbb{N}$, some
ordinal pattern $\pi$ of order $d$ and some $\omega \in \Omega$ the estimate is given by the number

$$\hat{p}_\pi = \frac{1}{t - d + 1} \#\{s \in \{0, 1, \ldots, t - d\} \mid (X(T^{\circ s}(\omega)), X(T^{\circ s+1}(\omega)), \ldots, X(T^{\circ s+d}(\omega))) \text{ has ordinal pattern } \pi\}.$$

Here $t + 1$ is the length of the considered orbit of $\omega$. Clearly, the estimation only makes
sense in the ergodic case. Then, by Birkhoff’s ergodic theorem, the corresponding
estimator is consistent.

If in the ergodic case all $\hat{p}_\pi$; $\pi \in \Pi_d$ are determined, it follows immediately that in
the simple case considered a reasonable estimator for (14) is given by the empirical
permutation entropy of order $d \in \mathbb{N}$:

$$-\frac{1}{d} \sum_{\pi \in \Pi_d} \hat{p}_\pi \ln \hat{p}_\pi.$$

Furthermore, it also gives some information on the Kolmogorov-Sinai entropy.
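The estimation procedure just described (relative frequencies of ordinal patterns, followed by a Shannon entropy divided by $d$) can be sketched in a few lines of Python. This is a minimal sketch under assumptions: the patterns are taken from forward windows with the tie rule of (16), the normalization by $d$ follows (14), and the function names are hypothetical.

```python
import math
from collections import Counter

def ordinal_pattern(values):
    """Indices sorted by decreasing value; ties: larger index first."""
    return tuple(sorted(range(len(values)), key=lambda i: (-values[i], -i)))

def pattern_frequencies(x, d):
    """Relative frequencies of the ordinal patterns of order d in the series x.
    A series of length t + 1 contains t - d + 1 windows."""
    patterns = [ordinal_pattern(x[s:s + d + 1]) for s in range(len(x) - d)]
    total = len(patterns)
    return {p: c / total for p, c in Counter(patterns).items()}

def empirical_permutation_entropy(x, d):
    """-(1/d) * sum of p ln p over the observed patterns of order d."""
    freqs = pattern_frequencies(x, d)
    return -sum(p * math.log(p) for p in freqs.values()) / d

if __name__ == "__main__":
    print(empirical_permutation_entropy([0, 1, 0, 1, 0, 1, 0], 1))
    print(empirical_permutation_entropy(list(range(20)), 3))
```

Patterns that never occur contribute nothing to the sum, in line with the convention $0 \ln(0) = 0$.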

Assets and drawbacks. Irrespective of the considered ordinal partition, the ordinal
approach brings along some practical advantages and disadvantages. Note that most
of the difficulties to overcome are common to any sort of time series analysis.

When considering the order relation between the values of a time series, small inaccuracies
in measurements (e.g. errors between the state of a system and its observed value)
are mostly negligible. Hence, the methods considered are relatively robust towards
calibration differences of measuring instruments. Furthermore, the ordinal approach
is easily interpretable and there already exist efficient methods to perform an ordinal
time series analysis in real time. For a deeper discussion we refer to Riedl et al. [20]
as well as Unakafova and Keller [25]. Last but not least, prior knowledge of the data
range is usually not necessary when analyzing data.

In contrast, the ordinal analysis of time series can be rather poor if the underlying
system is so complex that the order $d$ needed is too large for the available computational capacity.
If, for example, the permutation entropy of a dynamical system is very
large, its estimation by the empirical permutation entropy is problematic. Note that
generally, also for simple systems, the convergence of the empirical permutation entropies
of order $d$ to the permutation entropy can be rather slow, which is the reason for
considering a conditional adaption of the permutation entropy (see Unakafov and Keller
[24]).
In addition, the choice of a suitable order $d$ with respect to the length of the original
time series is affected by common problems. Large values of $d$ are needed to evaluate
the encapsulated information as accurately as possible, but a large $d$ yields $(d + 1)!$ possible
ordinal patterns, which have to be considered if nothing is known about the original
time series. If one chooses $d$ too large relative to the length of a time series, it can
happen that not all ordinal patterns which are substantial for describing the underlying
dynamics are observed in the ordinal pattern distribution. This is known as
undersampling.

Moreover, ordinal time series analysis can lead to an arbitrarily poor approximation
of the Kolmogorov-Sinai entropy or a poor representation of the underlying dynamics by
the statistics, especially when working with wrong assumptions, e.g. when a given system fails
to be ergodic or the chosen observables cause information loss while measuring. The
next section addresses the latter problem.

6. Algebra reconstruction dimension

Theorem 1.1 claims that the Kolmogorov-Sinai entropy of $T$ can be computed
provided that we have sufficiently many observables “generating” $\mathcal{A}$ up to $\mu$-measure zero.
For applications it is essential to ask how far the number
of observables can be decreased. In this section we briefly review the known results
in this direction.

Only one observable. The following example shows that theoretically in most real
cases a single such observable can be found.

Example 6.1. Let $I = [0,1]$, $Z$ be a separable complete metric space (such spaces
are called Polish), $\Omega \subset Z$ be its uncountable Borel subset, and $\mathcal{A} := \mathcal{B}(\Omega)$ be the
Borel $\sigma$-algebra of $\Omega$. Then the pair $(\Omega, \mathcal{B}(\Omega))$ is called a standard Borel space. It is
well known, e.g. see Kechris [13, Proposition 12.1], that then there exists a measurable
isomorphism of $(\Omega, \mathcal{B}(\Omega))$ onto the space $(I, \mathcal{B}(I))$, that is a bijection $X : \Omega \to I$ such
that $X^{-1}(\mathcal{B}(I)) = \mathcal{B}(\Omega)$.

Let $\mu$ be a measure on $(\Omega, \mathcal{B}(\Omega))$ and $T : \Omega \to \Omega$ be any $\mu$-preserving map. Then

$$\mathcal{B}(\Omega) \supset \sigma((X \circ T^{\circ t})_{t \in \mathbb{N}_0}) \supset \sigma(X) = X^{-1}(\mathcal{B}(I)) = \mathcal{B}(\Omega),$$

that is $\sigma((X \circ T^{\circ t})_{t \in \mathbb{N}_0}) = \mathcal{B}(\Omega)$. Moreover, as every separable metric space $Z$ can
be embedded into a Hilbert cube, which is a compact space, compare Hurewicz and
Wallman [11, Chapter V, §5, Theorem V4], we see that condition (2) holds for $\Omega \subset Z$
as well, and therefore by Theorem 1.1 the Kolmogorov-Sinai entropy $h_\mu(T)$ of $T$ can
be computed via the formula (5).

Notice that the function $X : \Omega \to [0,1] \subset \mathbb{R}$ from Example 6.1 is in general not
continuous and its explicit construction is very complicated. Therefore it is not useful
for real applications. This leads to the following notion.

Definition 6.2. Let $(\Omega, \mathcal{B}(\Omega))$ be a standard Borel space with measure $\mu$ on $\mathcal{B}(\Omega)$, and
$T : \Omega \to \Omega$ be a $\mathcal{B}(\Omega)$-$\mathcal{B}(\Omega)$-measurable map. By the algebra reconstruction dimension
of $T$ with respect to $\mu$ we will mean the minimal integer number $n \geq 1$ such that there
exists a continuous map $X : \Omega \to \mathbb{R}^n$ satisfying

(17) 

This number will be denoted by $\operatorname{ard}_\mu(T)$. If such $n$ does not exist, then we will assume
that $\operatorname{ard}_\mu(T) = \infty$.

Thus $\operatorname{ard}_\mu(T)$ is the minimal number of continuous observables needed to approximate
the Kolmogorov-Sinai entropy via (5).

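Given a map $T : \Omega \to \Omega$, a map $X : \Omega \to \mathbb{R}^n$ and $t \in \mathbb{N}$ one can define the following
$t$-reconstruction map

$$\Lambda_{X,T,t} = (X, X \circ T, \ldots, X \circ T^{\circ (t-1)}) : \Omega \to \mathbb{R}^{nt}$$

and an $\infty$-reconstruction map

$$\Lambda_{X,T,\infty} = (X, X \circ T, X \circ T^{\circ 2}, \ldots) : \Omega \to \mathbb{R}^{\infty}.$$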
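
As an illustration of the t-reconstruction map defined above, the following Python sketch builds it for a toy system. The system is an assumption chosen only for this example and does not come from the paper: a circle rotation on [0, 1) with a hypothetical angle `ALPHA`, observed through a single continuous cosine observable; the function name `reconstruction_map` is likewise hypothetical.

```python
import math


def reconstruction_map(X, T, t):
    """Return the map omega -> (X(omega), X(T omega), ..., X(T^(t-1) omega))."""
    def Lambda(omega):
        values = []
        for _ in range(t):
            values.extend(X(omega))  # X(omega) is a tuple, i.e. a point of R^n
            omega = T(omega)
        return tuple(values)         # a point of R^(n*t)
    return Lambda


ALPHA = 0.1  # assumption: hypothetical rotation angle


def T(omega):
    """Circle rotation on [0, 1)."""
    return (omega + ALPHA) % 1.0


def X(omega):
    """One continuous observable (n = 1), not injective on the circle."""
    return (math.cos(2 * math.pi * omega),)


Lambda1 = reconstruction_map(X, T, 1)
Lambda2 = reconstruction_map(X, T, 2)
a, b = 0.2, 0.8
print(X(a), X(b))              # first components agree up to rounding
print(Lambda2(a), Lambda2(b))  # second components differ
```

In this toy case the observable alone does not separate the points 0.2 and 0.8, whereas the 2-reconstruction map does, which is the mechanism by which adding time delays can enlarge the generated $\sigma$-algebra.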


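Evidently, $\Lambda_{X,T,1} = X$,

$$\sigma((X \circ T^{\circ s})_{s=0}^{t-1}) = \sigma(\Lambda_{X,T,t}),$$

and

$$\sigma(X) \subset \sigma(\Lambda_{X,T,t}) \subset \sigma(\Lambda_{X,T,t+1}) \subset \sigma(\Lambda_{X,T,\infty}), \quad t \in \mathbb{N}.$$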
In particular, (17) can be reformulated as follows:

(18) $\sigma(\Lambda_{X,T,\infty}) = \mathcal{B}(\Omega)$ up to sets of $\mu$-measure zero.


Before discussing $\operatorname{ard}_\mu(T)$ we will present an example for the existence of one separating
observable, that is $X : \Omega \to \mathbb{R}$ satisfying (18), and therefore allowing to approximate
the Kolmogorov-Sinai entropy by formula (5), see Theorem 6.5 below. However, now
this observable is “discrete”, i.e. it takes at most countably many values.


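Definition 6.3. Let $(\Omega, \mathcal{A}, \mu, T)$ be a measure-preserving dynamical system. An at
most countable partition $\mathcal{C} = \{C_i\}_{i=1}^{q} \subset \mathcal{A}$ of $\Omega$, for some $q \in \mathbb{N} \cup \{\infty\}$, is called
generating with respect to $T$, if

$$\sigma((T^{-t}\mathcal{C})_{t \in \mathbb{N}_0}) = \mathcal{A},$$

where $T^{-t}\mathcal{C} = \{T^{-t}(C_i)\}_{i=1}^{q}$.

The following lemma is evident.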


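Lemma 6.4. Suppose a measure-preserving dynamical system $(\Omega, \mathcal{A}, \mu, T)$ has a generating
partition $\mathcal{C} = \{C_i\}_{i=1}^{q}$, $q \in \mathbb{N} \cup \{\infty\}$. Define a function $X : \Omega \to \mathbb{R}$ by
$X = \sum_{i=1}^{q} i \cdot \mathbf{1}_{C_i}$ (compare Remark 3.7). Then $\sigma(X) = \sigma(\mathcal{C})$, whence

$$\sigma(\Lambda_{X,T,\infty}) = \sigma((X \circ T^{\circ t})_{t \in \mathbb{N}_0}) = \sigma((T^{-t}\mathcal{C})_{t \in \mathbb{N}_0}) = \mathcal{A}.$$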
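
The construction of Lemma 6.4 can be tried out on a standard textbook example, which is an assumption of this sketch and not taken from the paper: the doubling map on [0, 1) with the two-element partition $C_1 = [0, 1/2)$, $C_2 = [1/2, 1)$, which is well known to be generating with respect to the Lebesgue measure. The helper names `itinerary` and `cylinder` are hypothetical. The sketch shows that the values of $X \circ T^{\circ t}$ for $t = 0, \ldots, m-1$ locate a point within a dyadic interval of length $2^{-m}$.

```python
from fractions import Fraction

HALF = Fraction(1, 2)


def T(omega):
    """Doubling map on [0, 1), computed exactly with rational numbers."""
    return (2 * omega) % 1


def X(omega):
    """X = 1 * 1_{C_1} + 2 * 1_{C_2} with C_1 = [0, 1/2), C_2 = [1/2, 1)."""
    return 1 if omega < HALF else 2


def itinerary(omega, m):
    """The values X(omega), X(T omega), ..., X(T^(m-1) omega)."""
    symbols = []
    for _ in range(m):
        symbols.append(X(omega))
        omega = T(omega)
    return symbols


def cylinder(symbols):
    """Half-open interval [lo, hi) of all points whose itinerary starts with symbols."""
    lo, width = Fraction(0), Fraction(1)
    for s in symbols:
        width /= 2
        if s == 2:
            lo += width
    return lo, lo + width


omega = Fraction(5, 16)
symbols = itinerary(omega, 4)
lo, hi = cylinder(symbols)
print(symbols, lo, hi)
```

Longer itineraries shrink the interval further, which reflects the fact that the observable together with its time shifts generates the Borel $\sigma$-algebra in this example.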
In general, a $\mu$-preserving map does not have a generating partition. Nevertheless,
for non-singular ergodic automorphisms of standard probability spaces such partitions
do exist, as we discuss now. First we recall the necessary definitions.

Let $(\Omega, \mathcal{A}, \mu)$ be a probability space. The measure $\mu$ is called complete if for any
subset $A \in \mathcal{A}$ with $\mu(A) = 0$ every subset $B$ of $A$ also belongs to $\mathcal{A}$.

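A countable family of sets $\{A_i\}_{i \in \mathbb{N}} \subset \mathcal{A}$ is called a complete basis of $(\Omega, \mathcal{A}, \mu)$ if

(a) for each $A \in \mathcal{A}$ there exists a $B \in \sigma(\{A_i\}_{i \in \mathbb{N}})$ with $A \subset B$ and $\mu(B \setminus A) = 0$;

(b) for any distinct $\omega_1, \omega_2 \in \Omega$ there exists an $i \in \mathbb{N}$ such that $\omega_1 \in A_i$ and $\omega_2 \in \Omega \setminus A_i$;

(c) each intersection $\bigcap_{i \in \mathbb{N}} B_i$, where every $B_i$ is either $A_i$ or $\Omega \setminus A_i$, is non-empty.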

A probability space $(\Omega, \mathcal{A}, \mu)$ is called standard if it has a complete basis and $\mu$ is
complete.

It has been proved by Rohlin [21] that every standard probability space with
non-atomic measure is isomorphic to the probability space $(I, \mathcal{B}(I), \lambda)$, where $\lambda$ is the
Lebesgue measure on $I$.

Recall also that a one-to-one transformation $T : \Omega \to \Omega$ is non-singular with respect
to a measure $\mu$ if it is bi-measurable, i.e. $T^{-1}\mathcal{A} = \mathcal{A}$ and $T\mathcal{A} = \mathcal{A}$, and $\mu(A) = 0$ if and
only if $\mu(T(A)) = 0$ for all $A \in \mathcal{A}$.

The following theorem is a consequence of results by Rohlin [21], Parry [19] and
Krieger [12] about the existence of countable and finite generating partitions of ergodic
maps.

Theorem 6.5. [21, 19, 12] Let $(\Omega, \mathcal{A}, \mu)$ be a standard probability space, and
$T : \Omega \to \Omega$ be a non-singular ergodic $\mu$-preserving map. Then $\Omega$ has a countable generating
partition with respect to $T$. Hence there is a discrete measurable function $X : \Omega \to \mathbb{R}$
taking at most countably many distinct values and satisfying (18).

Moreover, if $h_\mu(T) < \infty$, then $T$ admits a finite generating partition, and so $X$ can
be assumed to take only finitely many distinct values.

The continuous case. Notice that the function $X$ from Theorem 6.5 is slightly better
than the one from Example 6.1, as it takes a discrete set of values mutually distinct for
distinct elements of the generating partition $\mathcal{C}$. Nevertheless, it is hard to construct, as
this requires knowing a generating partition for $T$, and so it is not useful for applications
either.

Now we will consider the opposite situation when almost any continuous map
$X : \Omega \to \mathbb{R}^n$ satisfies (18).

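Lemma 6.6. Let $\Omega$ be a Polish space admitting an embedding $X : \Omega \to \mathbb{R}^n$. Then for
any measure $\mu$ on $\mathcal{B}(\Omega)$ and any $\mu$-preserving map $T$, we have that $\operatorname{ard}_\mu(T) \leq n$. In
particular, if $\dim \Omega = k$, $k \in \mathbb{N}$, then $\operatorname{ard}_\mu(T) \leq 2k+1$.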

Proof. Since $X$ is an embedding, we obtain that $\sigma(X) = X^{-1}(\mathcal{B}(\mathbb{R}^n)) = \mathcal{B}(\Omega)$, whence
$\sigma(\Lambda_{X,T,\infty}) = \mathcal{B}(\Omega)$ as well.

The second statement follows from the well known fact that every $k$-dimensional
separable metric space $\Omega$ can be embedded into $\mathbb{R}^{2k+1}$ [11, Chapter V, §4, Theorem
V3]. Moreover, by the same theorem the set of embeddings $\operatorname{Emb}(\Omega, \mathbb{R}^{2k+1})$ is residual
(and, in particular, dense) in the space $C(\Omega, \mathbb{R}^{2k+1})$ of all continuous maps. Therefore
almost every family of $2k+1$ continuous observables will allow to approximate the
Kolmogorov-Sinai entropy of $T$. □

The next statement is a slight generalization of Theorem 2.2 from Keller [14]. 

Theorem 6.7. Let $\Omega$ be a smooth manifold and $\mathcal{D}(\Omega)$ be the group of its $C^\infty$ diffeomorphisms.
Then there exists a residual subset $W$ of $\mathcal{D}(\Omega)$ such that $\operatorname{ard}_\mu(T) = 1$ for
each $T \in W$ and any measure $\mu$ preserved by $T$.

Proof. Let $\dim \Omega = k$. For each $n \in \mathbb{N}$ let

$$S_n = \{(X, T) \in C^\infty(\Omega, \mathbb{R}) \times \mathcal{D}(\Omega) \mid \Lambda_{X,T,n} : \Omega \to \mathbb{R}^n \text{ is an embedding}\}.$$

Thus if $(X, T) \in S_n$, then $\operatorname{ard}_\mu(T) = 1$.

It is proved by Takens [23] that if $n \geq 2k+1$, then $S_n$ is residual (and in particular
non-empty and everywhere dense) in $C^\infty(\Omega, \mathbb{R}) \times \mathcal{D}(\Omega)$. Thus we have that
$S_{2k+1} = \bigcap_{i=1}^{\infty} U_i$, where each $U_i$ is open and everywhere dense in the space
$C^\infty(\Omega, \mathbb{R}) \times \mathcal{D}(\Omega)$. Let $p : C^\infty(\Omega, \mathbb{R}) \times \mathcal{D}(\Omega) \to \mathcal{D}(\Omega)$ be the natural projection,
i.e. $p(X, T) = T$. It is a standard fact from general topology that $p$ is an open map, whence

$$W = p(S_{2k+1}) = \bigcap_{i=1}^{\infty} p(U_i)$$

is a residual subset of $\mathcal{D}(\Omega)$. Then $\operatorname{ard}_\mu(T) = 1$ for each $T \in W$ and any measure $\mu$
preserved by $T$. □

Notice that the latter result does not guarantee that for any measure $\mu$ on $\mathcal{B}(\Omega)$
preserved by some diffeomorphism $T$ there exists some other $\mu$-preserving diffeomorphism
$T'$ with $\operatorname{ard}_\mu(T') = 1$.

The following notion allows us to decrease the dimension $2k+1$ in Lemma 6.6 by putting
some restrictions on $\mu$.

Definition 6.8. Let $X : \Omega \to R$ be a continuous map between topological spaces.
Then the following subset of $\Omega$,

$$N_X = \{\omega \in \Omega \mid X^{-1}(X(\omega)) \neq \{\omega\}\},$$

will be called the set of non-injectivity of $X$.
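
For a finite sample of points the set of non-injectivity can be computed directly by grouping points with equal image. The following Python sketch is only an illustration on a hypothetical finite grid of integers; the function name `non_injectivity_set` and both example maps are assumptions made for this example and do not come from the paper.

```python
from collections import defaultdict


def non_injectivity_set(points, X):
    """Points omega whose fibre X^{-1}(X(omega)) contains more than omega."""
    fibres = defaultdict(list)
    for omega in points:
        fibres[X(omega)].append(omega)  # group points by their image
    return {
        omega
        for fibre in fibres.values()
        if len(fibre) > 1
        for omega in fibre
    }


grid = range(-3, 4)  # hypothetical finite sample of the state space

# One observable: squaring identifies omega and -omega.
N_square = non_injectivity_set(grid, lambda w: w * w)

# Two observables: the second coordinate makes the map injective.
N_pair = non_injectivity_set(grid, lambda w: (w * w, w))

print(sorted(N_square), sorted(N_pair))
```

On this grid the squaring map is injective only at 0, while adding a second observable empties the set of non-injectivity. This mirrors, in a very crude way, how increasing the number of observables can shrink $N_X$.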

Lemma 6.9. (Antoniouk et al. [6, Theorem 4.2]) Let $X : \Omega \to R$ be a continuous map
between Polish spaces and $\mu$ be a measure on $\mathcal{B}(\Omega)$. Suppose there exists a Borel subset
$D$ such that $N_X \subset D$ and $\mu(D) = 0$. Then $\sigma(X) = \mathcal{B}(\Omega)$ up to sets of $\mu$-measure zero.

Let $\Omega$ be a smooth manifold of dimension $k$. Say that a subset $Q \subset \Omega$ has Lebesgue
measure zero, if for any local chart $\phi : \Omega \supset U \to \mathbb{R}^k$ in $\Omega$ the set $\phi(Q \cap U)$ has Lebesgue
measure zero in $\mathbb{R}^k$. Notice that there is no natural definition of a set of fixed positive
Lebesgue measure.

A measure $\mu$ on $\mathcal{B}(\Omega)$ will be said to be Lebesgue absolutely continuous if $\mu(Q) = 0$ for
each subset $Q \subset \Omega$ of Lebesgue measure zero.

Theorem 6.10. (Antoniouk et al. [6, Theorem 2.13]) Let $\Omega$ be a smooth manifold of
dimension $k$ and $\mu$ be a Lebesgue absolutely continuous measure on $\mathcal{B}(\Omega)$. For each
$n \in \mathbb{N}$ let

$$V_n = \{X \in C^\infty(\Omega, \mathbb{R}^n) \mid N_X \in \mathcal{B}(\Omega),\ \mu(N_X) = 0\}.$$

If $n > k$, then $V_n$ is residual in $C^\infty(\Omega, \mathbb{R}^n)$. Hence $\operatorname{ard}_\mu(T) \leq k+1$ for any (not
necessarily continuous) $\mu$-preserving map $T : \Omega \to \Omega$.

6.1. Comparison of results. It is convenient to compare these results in the following
table, where it is assumed that $\Omega$ is a Polish space of dimension $k$.


$\Omega$          $\mu$                           $T$                                   $\operatorname{ard}_\mu(T)$   Statement
----------------  ------------------------------  ------------------------------------  ----------------------------  ------------
Borel space       any measure                     any $\mu$-preserving measurable map   $\leq 2k+1$                   Lemma 6.6
Smooth manifold   Lebesgue absolutely continuous  any $\mu$-preserving measurable map   $\leq k+1$                    Theorem 6.10
Smooth manifold   any measure preserved by $T$    generic diffeomorphism                $1$                           Theorem 6.7